Combined Models for Topic Spotting and Topic-dependent Language Modeling

Authors

  • Brigitte Bigi
  • Renato De Mori
  • Marc El-Bèze
  • Thierry Spriet
Abstract

A new statistical method for language modeling and spoken document classification is proposed. It is based on a mixture of topic-dependent probabilities. Each topic-dependent probability is in turn a mixture of n-gram probabilities and the probability of Kullback-Leibler (KL) distances between keyword unigrams and the distribution obtained from the content of a cache memory. Experimental results on topic classification, using a corpus of 60 million words from the French newspaper Le Monde, show the excellent performance of the cache memory and its complementary role in providing different statistics for the decision process.
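As a rough illustration of the cache component only, the sketch below compares a smoothed keyword distribution built from a cache of recent words against per-topic keyword unigram distributions and ranks topics by KL divergence. The function names, the additive smoothing constant, the toy keyword counts, and the direction of the divergence (cache relative to topic) are all assumptions made for this example; the abstract does not specify them, and the mixture with n-gram probabilities is not modeled here.

```python
import math
from collections import Counter


def smoothed_unigram(counts, vocab, alpha=0.5):
    """Additively smoothed unigram distribution over a fixed keyword vocabulary.

    alpha is a hypothetical smoothing constant, not a value from the paper.
    """
    total = sum(counts.get(w, 0) for w in vocab) + alpha * len(vocab)
    return {w: (counts.get(w, 0) + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)


def rank_topics(cache_words, topic_keyword_counts, alpha=0.5):
    """Rank topics by KL(cache || topic); the smallest divergence comes first."""
    # The keyword vocabulary is the union of all topic keywords (an assumption).
    vocab = sorted(set().union(*topic_keyword_counts.values()))
    cache = smoothed_unigram(Counter(cache_words), vocab, alpha)
    scores = {
        topic: kl_divergence(cache, smoothed_unigram(counts, vocab, alpha))
        for topic, counts in topic_keyword_counts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Toy keyword counts, invented for illustration only.
TOPICS = {
    "sports": {"match": 8, "goal": 6},
    "finance": {"market": 9, "bank": 5},
}

if __name__ == "__main__":
    cache = "the goal came late in the match".split()
    for topic, score in rank_topics(cache, TOPICS):
        print(f"{topic}: {score:.3f}")
```

Smoothing both distributions over a shared vocabulary keeps every ratio inside the logarithm finite, which is one simple way to handle keywords that have not yet appeared in the cache.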

Similar resources

Dialogue Speech Recognition by Combining Hierarchical Topic Classification and Language Model Switching

An efficient, scalable speech recognition architecture combining topic detection and topic-dependent language modeling is proposed for multi-domain spoken language systems. In the proposed approach, the inferred topic is automatically detected from the user’s utterance, and speech recognition is then performed by applying an appropriate topic-dependent language model. This approach enables user...

A frame and segment based approach for topic spotting

In this paper we present a new approach for topic spotting based on subword units (phonemes and feature vectors) instead of words. Classification of topics is done by running topic-dependent polygram language models over these symbol sequences and deciding for the one with the best score. We trained and tested the two methods on three different corpora. The first is a part of a media corpus which c...

Improved topic-dependent language modeling using information retrieval techniques

N-gram language models are frequently used by speech recognition systems to constrain and guide the search. N-gram models use only the last N-1 words to predict the next word. Typical values of N range from 2 to 4. N-gram language models thus lack long-term context information. We show that the predictive power of the N-gram language models can be improved by using long-term ...

Dynamic Nonlocal Language Modeling via Hierarchical Topic-Based Adaptation

This paper presents a novel method of generating and applying hierarchical, dynamic topic-based language models. It proposes and evaluates new cluster generation, hierarchical smoothing and adaptive topic-probability estimation techniques. These combined models help capture long-distance lexical dependencies. Experiments on the Broadcast News corpus show significant improvement in perplexity (...

Publication date: 1997